A Distance-based Separability Measure for Internal Cluster Validation
Abstract
Evaluating clustering results is a significant part of cluster analysis. Since there are no true class labels for clustering in typical unsupervised learning, many internal cluster validity indices (CVIs), which use predicted labels and data, have been created. Without true labels, designing an effective CVI is as difficult as creating a clustering method. It is also crucial to have more CVIs, because there is no universal CVI that can be used to measure all datasets and no specific method for selecting a proper CVI for clusters without true labels. Therefore, applying a variety of CVIs to evaluate clustering results is necessary. In this paper, we propose a novel internal CVI, the Distance-based Separability Index (DSI), based on a data separability measure. We compared the DSI with eight internal CVIs, including studies from the early Dunn (1974) to the most recent CVDD (2019), and with an external CVI as ground truth, by using the clustering results of five clustering algorithms on 12 real and 97 synthetic datasets. Results show that the DSI is an effective, unique, and competitive CVI compared with other CVIs. We also summarized the general process for evaluating CVIs and created the rank-difference metric for comparing CVIs' results.
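As an illustration of what an internal CVI computes from data and predicted labels alone, the following is a minimal sketch of the Dunn index, the earliest of the compared indices named in the abstract. It is not the DSI and not the authors' code. The function name `dunn_index` and the choice of Euclidean distance with the common "closest pair / largest diameter" variant are assumptions made for this example.

```python
import numpy as np


def dunn_index(X, labels):
    """Dunn index (assumed variant): the smallest distance between points in
    different clusters divided by the largest distance between points in the
    same cluster. Larger values suggest compact, well-separated clusters."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)

    # Pairwise Euclidean distances between all points, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # same[i, j] is True when points i and j carry the same predicted label.
    same = labels[:, None] == labels[None, :]
    if same.all():
        raise ValueError("the Dunn index needs at least two clusters")

    max_diameter = dist[same].max()     # largest within-cluster distance
    min_separation = dist[~same].min()  # smallest between-cluster distance

    if max_diameter == 0.0:
        # Every cluster is a single point (or identical points).
        return float("inf")
    return min_separation / max_diameter


# Two tight groups that are far apart, labeled two different ways.
points = [[0, 0], [0, 1], [10, 0], [10, 1]]
good = dunn_index(points, [0, 0, 1, 1])  # labels match the groups
bad = dunn_index(points, [0, 1, 0, 1])   # labels cut across the groups
print(good, bad)
```

On this toy data the labeling that matches the two groups scores higher than the one that cuts across them. That is the behavior an internal CVI is meant to show without access to true labels.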
Similar resources
A Survey on Internal Validity Measure for Cluster Validation
Data clustering is a technique for finding similar characteristics in a data set, which are often hidden, and grouping the data into clusters. Different clustering algorithms produce different results, since they are very sensitive to the characteristics of the original data set, especially noise and dimension. The quality of such clustering process determines the purity o...
A Separability Index for Distance-based Clustering and Classification Algorithms
We propose a separability index that quantifies the degree of difficulty in a hard clustering problem under assumptions of a multivariate Gaussian distribution for each cluster. A preliminary index is first defined and several of its properties are explored both theoretically and numerically. Adjustments are then made to this index so that the final refinement is also interpretable in terms of ...
Validity Measure of Cluster Based On the Intra-Cluster and Inter-Cluster Distance
The k-means method has been shown to be effective in producing good clustering results for many practical applications. However, a direct algorithm of k-means method requires time proportional to the product of number of patterns and number of clusters per iteration. This is computationally very expensive especially for large datasets. The main disadvantage of the k-means algorithm is that the ...
A Compression Based Distance Measure for Texture
The analysis of texture is an important subroutine in application areas as diverse as biology, medicine, robotics and forensic science. While the last three decades have seen extensive research in algorithms to measure texture similarity, almost all existing methods require the careful setting of many parameters. There are many problems associated with a surfeit of parameters, the most obvious ...
Journal
Journal title: International Journal on Artificial Intelligence Tools
Year: 2022
ISSN: 1793-6349, 0218-2130
DOI: https://doi.org/10.1142/s0218213022600053